{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "<a href=\"https://www.bigdatauniversity.com\"><img src=\"https://ibm.box.com/shared/static/cw2c7r3o20w9zn8gkecaeyjhgw3xdgbj.png\" width=\"400\" align=\"center\"></a>\n",
    "\n",
    "<h1 align=\"center\"><font size=\"5\">Classification with Python</font></h1>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "In this notebook we try to practice all the classification algorithms that we learned in this course.\n",
    "\n",
    "We load a dataset using Pandas library, and apply the following algorithms, and find the best one for this specific dataset by accuracy evaluation methods.\n",
    "\n",
    "Lets first load required libraries:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "import itertools\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from matplotlib.ticker import NullFormatter\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.ticker as ticker\n",
    "from sklearn import preprocessing\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "### About dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "This dataset is about past loans. The __Loan_train.csv__ data set includes details of 346 customers whose loan are already paid off or defaulted. It includes following fields:\n",
    "\n",
    "| Field          | Description                                                                           |\n",
    "|----------------|---------------------------------------------------------------------------------------|\n",
    "| Loan_status    | Whether a loan is paid off on in collection                                           |\n",
    "| Principal      | Basic principal loan amount at the                                                    |\n",
    "| Terms          | Origination terms which can be weekly (7 days), biweekly, and monthly payoff schedule |\n",
    "| Effective_date | When the loan got originated and took effects                                         |\n",
    "| Due_date       | Since it’s one-time payoff schedule, each loan has one single due date                |\n",
    "| Age            | Age of applicant                                                                      |\n",
    "| Education      | Education of applicant                                                                |\n",
    "| Gender         | The gender of applicant                                                               |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "Lets download the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2019-10-05 09:42:43--  https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/loan_train.csv\n",
      "Resolving s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)... 67.228.254.193\n",
      "Connecting to s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)|67.228.254.193|:443... connected.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 23101 (23K) [text/csv]\n",
      "Saving to: ‘loan_train.csv’\n",
      "\n",
      "loan_train.csv      100%[===================>]  22.56K  --.-KB/s    in 0.02s   \n",
      "\n",
      "2019-10-05 09:42:44 (1.05 MB/s) - ‘loan_train.csv’ saved [23101/23101]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!wget -O loan_train.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/loan_train.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "### Load Data From CSV File  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>Unnamed: 0.1</th>\n",
       "      <th>loan_status</th>\n",
       "      <th>Principal</th>\n",
       "      <th>terms</th>\n",
       "      <th>effective_date</th>\n",
       "      <th>due_date</th>\n",
       "      <th>age</th>\n",
       "      <th>education</th>\n",
       "      <th>Gender</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>PAIDOFF</td>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>9/8/2016</td>\n",
       "      <td>10/7/2016</td>\n",
       "      <td>45</td>\n",
       "      <td>High School or Below</td>\n",
       "      <td>male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>PAIDOFF</td>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>9/8/2016</td>\n",
       "      <td>10/7/2016</td>\n",
       "      <td>33</td>\n",
       "      <td>Bechalor</td>\n",
       "      <td>female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>PAIDOFF</td>\n",
       "      <td>1000</td>\n",
       "      <td>15</td>\n",
       "      <td>9/8/2016</td>\n",
       "      <td>9/22/2016</td>\n",
       "      <td>27</td>\n",
       "      <td>college</td>\n",
       "      <td>male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>PAIDOFF</td>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>9/9/2016</td>\n",
       "      <td>10/8/2016</td>\n",
       "      <td>28</td>\n",
       "      <td>college</td>\n",
       "      <td>female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>6</td>\n",
       "      <td>6</td>\n",
       "      <td>PAIDOFF</td>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>9/9/2016</td>\n",
       "      <td>10/8/2016</td>\n",
       "      <td>29</td>\n",
       "      <td>college</td>\n",
       "      <td>male</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Unnamed: 0  Unnamed: 0.1 loan_status  Principal  terms effective_date  \\\n",
       "0           0             0     PAIDOFF       1000     30       9/8/2016   \n",
       "1           2             2     PAIDOFF       1000     30       9/8/2016   \n",
       "2           3             3     PAIDOFF       1000     15       9/8/2016   \n",
       "3           4             4     PAIDOFF       1000     30       9/9/2016   \n",
       "4           6             6     PAIDOFF       1000     30       9/9/2016   \n",
       "\n",
       "    due_date  age             education  Gender  \n",
       "0  10/7/2016   45  High School or Below    male  \n",
       "1  10/7/2016   33              Bechalor  female  \n",
       "2  9/22/2016   27               college    male  \n",
       "3  10/8/2016   28               college  female  \n",
       "4  10/8/2016   29               college    male  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('loan_train.csv')\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(346, 10)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "### Convert to date time object "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>Unnamed: 0.1</th>\n",
       "      <th>loan_status</th>\n",
       "      <th>Principal</th>\n",
       "      <th>terms</th>\n",
       "      <th>effective_date</th>\n",
       "      <th>due_date</th>\n",
       "      <th>age</th>\n",
       "      <th>education</th>\n",
       "      <th>Gender</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>PAIDOFF</td>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>2016-09-08</td>\n",
       "      <td>2016-10-07</td>\n",
       "      <td>45</td>\n",
       "      <td>High School or Below</td>\n",
       "      <td>male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>PAIDOFF</td>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>2016-09-08</td>\n",
       "      <td>2016-10-07</td>\n",
       "      <td>33</td>\n",
       "      <td>Bechalor</td>\n",
       "      <td>female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>PAIDOFF</td>\n",
       "      <td>1000</td>\n",
       "      <td>15</td>\n",
       "      <td>2016-09-08</td>\n",
       "      <td>2016-09-22</td>\n",
       "      <td>27</td>\n",
       "      <td>college</td>\n",
       "      <td>male</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>PAIDOFF</td>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>2016-09-09</td>\n",
       "      <td>2016-10-08</td>\n",
       "      <td>28</td>\n",
       "      <td>college</td>\n",
       "      <td>female</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>6</td>\n",
       "      <td>6</td>\n",
       "      <td>PAIDOFF</td>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>2016-09-09</td>\n",
       "      <td>2016-10-08</td>\n",
       "      <td>29</td>\n",
       "      <td>college</td>\n",
       "      <td>male</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Unnamed: 0  Unnamed: 0.1 loan_status  Principal  terms effective_date  \\\n",
       "0           0             0     PAIDOFF       1000     30     2016-09-08   \n",
       "1           2             2     PAIDOFF       1000     30     2016-09-08   \n",
       "2           3             3     PAIDOFF       1000     15     2016-09-08   \n",
       "3           4             4     PAIDOFF       1000     30     2016-09-09   \n",
       "4           6             6     PAIDOFF       1000     30     2016-09-09   \n",
       "\n",
       "    due_date  age             education  Gender  \n",
       "0 2016-10-07   45  High School or Below    male  \n",
       "1 2016-10-07   33              Bechalor  female  \n",
       "2 2016-09-22   27               college    male  \n",
       "3 2016-10-08   28               college  female  \n",
       "4 2016-10-08   29               college    male  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['due_date'] = pd.to_datetime(df['due_date'])\n",
    "df['effective_date'] = pd.to_datetime(df['effective_date'])\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "# Data visualization and pre-processing\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "Let’s see how many of each class is in our data set "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PAIDOFF       260\n",
       "COLLECTION     86\n",
       "Name: loan_status, dtype: int64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['loan_status'].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "260 people have paid off the loan on time while 86 have gone into collection \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lets plot some columns to underestand data better:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Solving environment: done\n",
      "\n",
      "\n",
      "==> WARNING: A newer version of conda exists. <==\n",
      "  current version: 4.5.11\n",
      "  latest version: 4.7.12\n",
      "\n",
      "Please update conda by running\n",
      "\n",
      "    $ conda update -n base -c defaults conda\n",
      "\n",
      "\n",
      "\n",
      "## Package Plan ##\n",
      "\n",
      "  environment location: /home/jupyterlab/conda/envs/python\n",
      "\n",
      "  added / updated specs: \n",
      "    - seaborn\n",
      "\n",
      "\n",
      "The following packages will be downloaded:\n",
      "\n",
      "    package                    |            build\n",
      "    ---------------------------|-----------------\n",
      "    qt-5.6.3                   |       h8bf5577_3        45.7 MB  anaconda\n",
      "    certifi-2019.9.11          |           py36_0         154 KB  anaconda\n",
      "    pyqt-5.6.0                 |           py36_2         5.4 MB  anaconda\n",
      "    matplotlib-2.2.2           |   py36hb69df0a_2         6.6 MB  anaconda\n",
      "    libgcc-7.2.0               |       h69d50b8_2         304 KB  anaconda\n",
      "    sip-4.19.13                |   py36he6710b0_0         293 KB  anaconda\n",
      "    openssl-1.1.1              |       h7b6447c_0         5.0 MB  anaconda\n",
      "    seaborn-0.9.0              |           py36_0         379 KB  anaconda\n",
      "    ------------------------------------------------------------\n",
      "                                           Total:        63.9 MB\n",
      "\n",
      "The following NEW packages will be INSTALLED:\n",
      "\n",
      "    libgcc:     7.2.0-h69d50b8_2      anaconda   \n",
      "\n",
      "The following packages will be UPDATED:\n",
      "\n",
      "    certifi:    2019.6.16-py36_1      conda-forge --> 2019.9.11-py36_0       anaconda\n",
      "    openssl:    1.1.1c-h516909a_0     conda-forge --> 1.1.1-h7b6447c_0       anaconda\n",
      "    sip:        4.19.8-py37hf484d3e_0             --> 4.19.13-py36he6710b0_0 anaconda\n",
      "\n",
      "The following packages will be DOWNGRADED:\n",
      "\n",
      "    matplotlib: 2.2.3-py37hb69df0a_0              --> 2.2.2-py36hb69df0a_2   anaconda\n",
      "    pyqt:       5.9.2-py37h05f1152_2              --> 5.6.0-py36_2           anaconda\n",
      "    qt:         5.9.6-h8703b6f_2                  --> 5.6.3-h8bf5577_3       anaconda\n",
      "    seaborn:    0.9.0-py_1            conda-forge --> 0.9.0-py36_0           anaconda\n",
      "\n",
      "\n",
      "Downloading and Extracting Packages\n",
      "qt-5.6.3             | 45.7 MB   | ##################################### | 100% \n",
      "certifi-2019.9.11    | 154 KB    | ##################################### | 100% \n",
      "pyqt-5.6.0           | 5.4 MB    | ##################################### | 100% \n",
      "matplotlib-2.2.2     | 6.6 MB    | ##################################### | 100% \n",
      "libgcc-7.2.0         | 304 KB    | ##################################### | 100% \n",
      "sip-4.19.13          | 293 KB    | ##################################### | 100% \n",
      "openssl-1.1.1        | 5.0 MB    | ##################################### | 100% \n",
      "seaborn-0.9.0        | 379 KB    | ##################################### | 100% \n",
      "Preparing transaction: done\n",
      "Verifying transaction: done\n",
      "Executing transaction: done\n"
     ]
    }
   ],
   "source": [
    "# notice: installing seaborn might takes a few minutes\n",
    "!conda install -c anaconda seaborn -y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAADQCAYAAABStPXYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAbQUlEQVR4nO3de5hU1Znv8e9P7NgoREVaRRC7RRRBmQZ7NF6HwEhQ420cDcYonDiHaLwcJnq8nsc48YnxQmKS4xVHDplEQEMGdEiiEiNHMfHSYIvghXhptRUQSE7UIATwPX/U7k7RVtOXqu7eXfX7PM9+ateqvdd+i67FW3vtXWspIjAzM0ubHbo7ADMzs1ycoMzMLJWcoMzMLJWcoMzMLJWcoMzMLJWcoMzMLJWcoDqJpL0kzZL0pqQlkn4v6fQC1T1G0oJC1NUVJC2SVNPdcVj3KKa2IKlC0rOSXpB0bCce5+POqrsncYLqBJIEzAeejIj9I+IwYCIwqJvi2bE7jmtWhG1hHPBqRIyKiKcKEZO1zAmqc4wF/hoRdzcWRMTbEfG/AST1knSrpOclLZP0jaR8THK2MVfSq5LuTxo4kiYkZYuBf2qsV9IukmYkdb0g6dSkfLKkn0v6L+CxfN6MpJmS7pL0RPIt+B+SY74iaWbWdndJqpW0QtK/tVDX+OQb9NIkvj75xGapVzRtQVI1cAtwoqQ6Sb1b+jxLqpd0Y/JaraTRkh6V9IakC5Jt+kh6PNn3pcZ4cxz3f2b9++RsV0UrIrwUeAEuBW7bzutTgP+VrO8E1AJVwBjgz2S+Xe4A/B44BigH3gWGAgIeBBYk+98IfC1Z3w1YCewCTAYagH4txPAUUJdj+ccc284E5iTHPhX4EDg0iXEJUJ1s1y957AUsAkYmzxcBNUB/4Elgl6T8SuC67v57eem8pQjbwmTg9mS9xc8zUA9cmKzfBiwD+gIVwAdJ+Y7A57Pqeh1Q8vzj5HE8MD15rzsAC4Djuvvv2lWLu366gKQ7yDSuv0bE35P50I2U9M/JJruSaXB/BZ6LiIZkvzqgEvgYeCsi/pCU/4xMwyap6xRJlyfPy4HByfrCiPhjrpgior395/8VESHpJWBNRLyUxLIiibEOOEvSFDINbwAwnEzDbPSFpOzp5Mvw58j8x2MlokjaQqPWPs8PJ48vAX0i4iPgI0kbJe0G/AW4UdJxwKfAQGAvYHVWHeOT5YXkeR8y/z5PdjDmHsUJqnOsAM5ofBIRF0nqT+bbIWS+DV0SEY9m7yRpDLApq2grf/sbtTRoooAzIuK1ZnUdQaYB5N5JeorMN7rmLo+I3+Qob4zr02YxfgrsKKkKuBz4+4j4U9L1V54j1oURcXZLcVnRKca2kH287X2et9tmgHPInFEdFhGbJdWTu818LyLu2U4cRcvXoDrHb4FySRdmle2ctf4ocKGkMgBJB0raZTv1vQpUSRqSPM9uEI8Cl2T1z49qS4ARcWxEVOdYttcgt+fzZP4T+LOkvYATcmzzDHC0pAOSWHeWdGAHj2c9QzG3hXw/z7uS6e7bLOmLwH45tnkU+HrWta2BkvZsxzF6NCeoThCZzuPTgH+Q9Jak54CfkOmjBvh34GVgqaTlwD1s52w2IjaS6cb4ZXJh+O2sl28AyoBlSV03FPr9tEVEvEimG2IFMAN4Osc2a8n04c+WtIxMAx/WhWFaFyvmtlCAz/P9QI2kWjJnU6/mOMZjwCzg90n3+lxyn+0VpcYLcmZmZqniMygzM0slJygzM0slJygzM0slJygzM0ulVCSoCRMmBJnfNnjxUixLwbh9eCmypc1SkaDWrVvX3SGYpZbbh5WqVCQoMzOz5pygzMwslZygzMwslTxYrJkVlc2bN9PQ0MDGjRu7O5SSVl5ezqBBgygrK+twHU5QZlZUGhoa6Nu3L5WVlSTjxloXiwjWr19PQ0MDVVVVHa7HXXxmVlQ2btzIHnvs4eTUjSSxxx575H0W6wRlJWO/AQOQlPey34AB3f1WrBVOTt2vEH8Dd/FZyXhn9Woa9hmUdz2D3m8oQDRm1hqfQZlZUSvUmXN7zqB79epFdXU1hxxyCGeeeSYbNmxoem3evHlI4tVX/zb9U319PYcccggAixYtYtddd2XUqFEcdNBBHHfccSxYsGCb+qdPn86wYcMYNmwYhx9+OIsXL256bcyYMRx00EFUV1dTXV3N3Llzt4mpcamvr8/nn7VL+AzKzIpaoc6cG7XlDLp3797U1dUBcM4553D33XfzrW99C4DZs2dzzDHHMGfOHK6//vqc+x977LFNSamuro7TTjuN3r17M27cOBYsWMA999zD4sWL6d+/P0uXLuW0007jueeeY++99wbg/vvvp6ampsWYeopWz6AkzZD0QTJDZWPZ9ZLek1SXLCdmvXa1pNclvSbpS50VuJlZT3Dsscfy+uuvA/Dxxx/z9NNPc9999zFnzpw27V9dXc11113H7bffDsDNN9/MrbfeSv/+/QEYPXo0kyZN4o477uicN9CN2tLFNxOYkKP8toioTpZfAUgaDkwERiT73CmpV6GCNTPrSbZs2cKvf/1rDj30UADmz5/PhAkTOPDAA+nXrx9Lly5tUz2jR49u6hJcsWIFhx122Dav19TUsGLFiqbn55xzTlNX3vr16wH45JNPmspOP/30Qry9TtdqF19EPCmpso31nQrMiYhNwFuSXgcOB37f4QjNzHqYxmQAmTOo888/H8h0702dOhWAiRMnMnv2bEaPHt1qfRHbHwQ8Ira5a65YuvjyuQZ1saTzgFrgsoj4EzAQeCZrm4ak7DMkTQGmAAwePDiPMMyKj9tHz5YrGaxfv57f/va3LF++HEls3boVSdxyyy2t1vfCCy9w8MEHAzB8+HCWLFnC2LFjm15funQpw4cPL+ybSIGO3sV3FzAEqAZWAd9PynPd+J4z9UfE9IioiYiaioqKDoZhVpzcPorP3LlzOe+883j77bepr6/n3Xffpaqqaps78HJZtmwZN9xwAxdddBEAV1xxBVdeeWVT111dXR0zZ87km9/8Zqe/h67WoTOoiFjTuC7pXqDxHsgGYN+sTQcB73c4OjOzPA3ee++C/nZtcHKnXHvNnj2bq666apuyM844g1mzZnHllVduU/7UU08xatQoNmzYwJ577smPf/xjxo0bB8App5zCe++9x1FHHYUk+vbty89+9jMGFOEPyNVa3yZAcg1qQUQckjwfEBGrkvV/BY6IiImSRgCzyFx32gd4HBgaEVu3V39NTU3U1tbm8z7MWiWpYD/UbUO7KdhQBm4f7fPKK680dYdZ92rhb9HmttHqGZSk2cAYoL+kBuDbwBhJ1WS67+qBbwBExApJDwIvA1uAi1pLTmZmZrm05S6+s3MU37ed7b8LfDefoMzMzDzUkZmZpZITlJmZpZITlJmZpZITlJmZpZITlJkVtX0GDS7odBv7DGp9ZI/Vq1czceJEhgwZwvDhwznxxBNZuXIlK1asYOzYsRx44IEMHTqUG264oeknCzNnzuTiiy/+TF2VlZWsW7dum7KZM2dSUVGxzfQZL7/8MgArV67kxBNP5IADDuDggw/mrLPO4oEHHmjark+fPk3TcZx33nksWrSIL3/5y011z58/n5EjRzJs2DAOPfRQ5s+f3/Ta5MmTGThwIJs2bQJg3bp1VFZWtvtv0laebsPMitqq997liOseKVh9z34n19jZfxMRnH766UyaNKlpxPK6ujrWrFnD5MmTueuuuxg/fjwbNmzgjDPO4M4772waJaI9vvKVrzSNcN5o48aNnHTSSfzgBz/g5JNPBuCJJ56goqKiaeilMWPGMG3atKax+hYtWtS0/4svvsjll1/OwoULqaqq4q233uL4449n//33Z+TIkUBmXqkZM2Zw4YUXtjvm9vIZlJlZAT3xxBOUlZVxwQUXNJVVV1ezcuVKjj76aMaPHw/AzjvvzO23385NN91UsGPPmjWLI488sik5AXzxi19smgyxNdOmTeOaa66hqqoKgKqqKq6++mpuvfXWpm2mTp3KbbfdxpYtWwoWd0ucoMzMCmj58uWfmQ4Dck+TMWTIED7++GM+/PDDdh8nu9uuurqaTz75pMVjt1VbpvIYPHgwxxxzDD/96U87fJy2chefmVkXaD4lRraWyrcnVxdfvnLFmKvsmmuu4ZRTTuGkk04q6PGb8xmUmVkBjRgxgiVLluQsbz6m4ptvvkmfPn3o27dvpx67Pfs3jzHXVB4HHHAA1dXVPPjggx0+Vls4QZmZFdDYsWPZtGkT9957b1PZ888/z9ChQ1m8eDG/+c1vgMykhpdeeilXXHFFwY791a9+ld/97nf88pe/bCp75JFHeOmll9q0/+WXX873vvc96uvrAaivr+fGG2/ksssu+8y21157LdOmTStI3C1xF5+ZFbUBA/dt9c679ta3PZKYN28eU6dO5aabbqK8vJzKykp++MMf8tBDD3HJJZdw0UUXsXXrVs4999xtbi2fOXPmNrd1P/NMZv7XkSNHssMOmfOJs846i5EjR/LAAw9sM5fUnXfeyVFHHcWCBQuYOnUqU6dOpaysjJEjR/KjH/2oTe+turqam2++mZNPPpnNmzdTVlbGLbfc0jQ7cLYRI0YwevToNk9b3xFtmm6js3k6AesKnm6jNHi6jfTId7qNVrv4JM2Q9IGk5Vllt0p6VdIySfMk7ZaUV0r6RFJdstzd1kDMzMyyteUa1Eyg+fnxQuCQiBgJrASuznrtjYioTpYLMDMz64BWE1REPAn8sVnZYxHR+CutZ8hM7W5mlgppuHRR6grxNyjEXXxfB36d9bxK0guS/q+kY1vaSdIUSbWSateuXVuAMMyKh9tHx5WXl7N+/XonqW4UEaxfv57y8vK86snrLj5J15KZ2v3+pGgVMDgi1ks6DJgvaUREfOZn0hExHZgOmYvA+cRhVmzcPjpu0KBBNDQ04MTevcrLyxk0KL/OtQ4nKEmTgC8D4yL5qhIRm4BNyfoSSW8ABwK+BcnMukRZWVnTWHLWs3Woi0/SBOBK4JSI2JBVXiGpV7K+PzAUeLMQgZqZWWlp9QxK0mxgDNBfUgPwbTJ37e0ELEzGaHomuWPvOOA7krYAW4ELIuKPOSs2MzPbjlYTVEScnaP4vha2/QXwi3yDMjMz81h8ZmaWSk5QZmaWSk5QZmaWSk5QZmaWSk5QZmaWSk5QZmaWSk5QZmaWSk5QZmaWSk5QZmaWSk5QZmaWSk5QZmaWSk5QZmaWSk5QZmaWSk5QZmaWSq0mKEkzJH0gaXlWWT9JCyX9IXncPeu1qyW9Luk1SV/qrMDNzKy4teUMaiYwoVnZVcDjETEUeDx5jqThwERgRLLPnY0z7JqZmbVHqwkqIp4Ems+Keyrwk2T9J8BpWeVzImJTRLwFvA4cXqBYzcyshHT0GtReEbEKIHncMykfCLybtV1DUvYZkqZIqpVUu3bt2g6GYVac3D7MCn+ThHKURa4NI2J6RNRERE1FRUWBwzDr2dw+zDqeoNZIGgCQPH6QlDcA+2ZtNwh4v+PhmZlZqepognoYmJSsTwIeyiqfKGknSVXAUOC5/EI0M7NStGNrG0iaDYwB+ktqAL4N3AQ8KOl84B3gTICIWCHpQeBlYAtwUURs7aTYzcysiLWaoCLi7BZeGtfC9t8FvptPUGZmZh5JwszMUskJyszMUskJyszMUskJyszMUskJyszMUskJyszMUskJyszMUskJyszMUskJyszMUskJyszMUskJyszMUskJyszMUskJyszMUqnV0cxbIukg4IGsov2B64DdgP8ONM5TfU1E/KrDEZqZWUnqcIKKiNeAagBJvYD3gHnAfwNui4hpBYnQzMxKUqG6+MYBb0TE2wWqz8zMSlyhEtREYHbW84slLZM0Q9LuuXaQNEVSraTatWvX5trErGS5fZgVIEFJ+hxwCvDzpOguYAiZ7r9VwPdz7RcR0yOiJiJqKioq8g3DrKi4fZgV5gzqBGBpRKwBiIg1EbE1Ij4F7gUOL8AxzMysxBQiQZ1NVveepAFZr50OLC/AMczMrMR0+C4+AEk7A8cD38gqvkVSNRBAfbPXzMzM2iSvBBURG4A9mpWdm1dEZmZmeCQJMzNLKScoMzNLJScoMzNLJScoMzNLJScoMzNLJScoMzNLpbxuMzfrSdSrjEHvNxSkHjPrfE5QVjJi62aOuO6RvOt59jsTChCNmbXGXXxmZpZKTlBmZpZKTlBmZpZKTlBmZpZKTlBmZpZKTlBmZpZK+c4HVQ98BGwFtkREjaR+wANAJZn5oM6KiD/lF6aZmZWaQpxBfTEiqiOiJnl+FfB4RAwFHk+eWwnab8AAJOW97DdgQOsHM7Oi0xk/1D0VGJOs/wRYBFzZCcexlHtn9Woa9hmUdz2FGP3BzHqefM+gAnhM0hJJU5KyvSJiFUDyuGeuHSVNkVQrqXbt2rV5hmFWXNw+zPJPUEdHxGjgBOAiSce1dceImB4RNRFRU1FRkWcYZsXF7cMszwQVEe8njx8A84DDgTWSBgAkjx/kG6SZmZWeDicoSbtI6tu4DowHlgMPA5OSzSYBD+UbpJmZlZ58bpLYC5gnqbGeWRHxiKTngQclnQ+8A5yZf5hmZlZqOpygIuJN4O9ylK8HxuUTlJmZmUeSMDOzVHKCMjOzVHKCMjOzVHKCMjOzVHKCMjOzVHKCMjOzVHKCMjOzVHKCMjOzVHKCMjOzVHKCMjOzVHKCMjMrcYWa/brQM2B3xoy6ZmbWgxRq9mso7AzYPoMyM7NUymc+qH0lPSHpFUkrJP2PpPx6Se9JqkuWEwsXrpmZlYp8uvi2AJdFxNJk4sIlkhYmr90WEdPyD8/MzEpVPvNBrQJWJesfSXoFGFiowMzMrLQV5BqUpEpgFPBsUnSxpGWSZkjavYV9pkiqlVS7du3aQoRhVjTcPswKkKAk9QF+AUyNiA+Bu4AhQDWZM6zv59ovIqZHRE1E1FRUVOQbhllRcfswyzNBSSojk5zuj4j/BIiINRGxNSI+Be4FDs8/TDMzKzX53MUn4D7glYj4QVZ59q+0TgeWdzw8MzMrVfncxXc0cC7wkqS6pOwa4GxJ1UAA9cA38orQzMxKUj538S0GlOOlX3U8HDMzswyPJGFmZqnksfis06hXWUHG5VKvsgJEY2Y9jROUdZrYupkjrnsk73qe/c6EAkRjZj2Nu/jMzCyVnKDMzCyVnKDMzCyVnKDMzCyVnKDMzLpYoaZYL+T06mnku/jMzLpYoaZYL+T06mnkMygzM0slJygzM0sld/GZmZW4Qo360lhXoThBmZmVuEKN+gKFHfnFXXxmZpZKnZagJE2Q9Jqk1yVdlW99vi3TzKy0dEoXn6RewB3A8UAD8LykhyPi5Y7W6dsyzcxKS2ddgzoceD0i3gSQNAc4Fehwgkqb/QYM4J3Vq/OuZ/Dee/P2qlUFiKi4SbnmxrQ0cttoXaFuStihV1lRtw1FROErlf4ZmBAR/5I8Pxc4IiIuztpmCjAleXoQ8FrBA2m7/sC6bjx+Phx712tL3OsiosNXi1PUPnrq3wgce3dpLfY2t43OOoPKldK3yYQRMR2Y3knHbxdJtRFR091xdIRj73pdEXda2kdP/RuBY+8uhYy9s26SaAD2zXo+CHi/k45lZmZFqLMS1PPAUElVkj4HTAQe7qRjmZlZEeqULr6I2CLpYuBRoBcwIyJWdMaxCqTbu1Ly4Ni7Xk+NuyN68nt17N2jYLF3yk0SZmZm+fJIEmZmlkpOUGZmlkolk6Ak9ZL0gqQFyfN+khZK+kPyuHvWtlcnQzS9JulL3Rc1SNpN0lxJr0p6RdKRPSj2f5W0QtJySbMllac1dkkzJH0gaXlWWbtjlXSYpJeS136sHvArSreNbondbaMtbSMiSmIBvgXMAhYkz28BrkrWrwJuTtaHAy8COwFVwBtAr26M+yfAvyTrnwN26wmxAwOBt4DeyfMHgclpjR04DhgNLM8qa3eswHPAkWR+C/hr4ITu+uy04727bXRt3G4bbWwb3d44uugfeBDwODA2qxG+BgxI1gcAryXrVwNXZ+37KHBkN8X9+eSDrGblPSH2gcC7QD8yd4suAManOXagslkjbFesyTavZpWfDdzTHf/+7XjPbhtdH7vbRhvbRql08f0QuAL4NKtsr4hYBZA87pmUN354GjUkZd1hf2At8H+SLph/l7QLPSD2iHgPmAa8A6wC/hwRj9EDYs/S3lgHJuvNy9PMbaOLuW1sU75dRZ+gJH0Z+CAilrR1lxxl3XUv/o5kTq3viohRwF/InE63JDWxJ33Sp5I5zd8H2EXS17a3S46ytP4GoqVYe9J7cNtw2+gMBW0bRZ+ggKOBUyTVA3OAsZJ+BqyRNAAgefwg2T5NwzQ1AA0R8WzyfC6ZRtkTYv9H4K2IWBsRm4H/BI6iZ8TeqL2xNiTrzcvTym2je7httPE9FH2CioirI2JQRFSSGXLptxHxNTJDL01KNpsEPJSsPwxMlLSTpCpgKJmLe10uIlYD70o6KCkaR2bKktTHTqb74guSdk7u1hkHvELPiL1Ru2JNujo+kvSF5D2fl7VP6rhtuG3koWvaRndcJOyuBRjD3y4E70Hm4vAfksd+WdtdS+buk9fo5ruwgGqgFlgGzAd270Gx/xvwKrAc+CmZO3tSGTswm8z1gM1kvu2d35FYgZrk/b4B3E6zi/hpXdw2ujx2t402tA0PdWRmZqlU9F18ZmbWMzlBmZlZKjlBmZlZKjlBmZlZKjlBmZlZKjlBpZikrZLqkhGPfy5p5xa2+5Wk3TpQ/z6S5uYRX72k/h3d36yj3DZKg28zTzFJH0dEn2T9fmBJRPwg63WR+Rt+2lIdnRxfPVATEeu64/hWutw2SoPPoHqOp4ADJFUqM/fNncBSYN/Gb2tZr92rzFwzj0nqDSDpAEm/kfSipKWShiTbL09enyzpIUmPJPO4fLvxwJLmS1qS1DmlW969WcvcNoqUE1QPIGlH4ATgpaToIOA/ImJURLzdbPOhwB0RMQL4f8AZSfn9SfnfkRn3a1WOQx0OnEPmF/pnSqpJyr8eEYeR+SX4pZL2KNBbM8uL20Zxc4JKt96S6sgM5/IOcF9S/nZEPNPCPm9FRF2yvgSolNQXGBgR8wAiYmNEbMix78KIWB8Rn5AZwPKYpPxSSS8Cz5AZCHJo3u/MLD9uGyVgx+4OwLbrk4iozi7IdK3zl+3ssylrfSvQm9xD3efS/IJkSBpDZvTlIyNig6RFQHkb6zPrLG4bJcBnUCUgIj4EGiSdBpCMNJzrrqfjJfVL+uZPA54GdgX+lDTAYcAXuixws07mtpFuTlCl41wy3RHLgN8Be+fYZjGZkZXrgF9ERC3wCLBjst8NZLoyzIqJ20ZK+TZzAzJ3KpG5Lfbi7o7FLE3cNrqPz6DMzCyVfAZlZmap5DMoMzNLJScoMzNLJScoMzNLJScoMzNLJScoMzNLpf8PgN7OMX/lNXoAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x216 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "import seaborn as sns\n",
    "\n",
    "bins = np.linspace(df.Principal.min(), df.Principal.max(), 10)\n",
    "g = sns.FacetGrid(df, col=\"Gender\", hue=\"loan_status\", palette=\"Set1\", col_wrap=2)\n",
    "g.map(plt.hist, 'Principal', bins=bins, ec=\"k\")\n",
    "\n",
    "g.axes[-1].legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAADQCAYAAABStPXYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAZQElEQVR4nO3de5BU5bnv8e9PmGRQiIoMMjpyUVFEJQPOViNqEJTD9orHrXHHKNbxhKNBDRU93nLKSrZVxlupyfESSbSgoqBus0E3qWgQnRMx3gBHBDHq0UFHucdEOQhBfM4fvZgMMDA9w+rp1T2/T9Wq7vX26nc9L9MvT693rX6XIgIzM7Os2a3YAZiZmbXGCcrMzDLJCcrMzDLJCcrMzDLJCcrMzDLJCcrMzDLJCSplkvaVNF3S+5IWSHpJ0tkp1T1K0uw06uoMkuol1RU7Diu+cuoXkqokvSLpdUknFHA/6wpVd6lwgkqRJAGzgD9GxIERcRRwPlBTpHi6F2O/Zi2VYb8YA7wdEcMj4oU0YrLWOUGlazTw94j45ZaCiFgWEf8bQFI3SbdLek3SIkn/IykflRxtPCHpbUmPJJ0aSeOSsnnAf91Sr6Q9JD2U1PW6pLOS8osl/buk/wT+sCuNkTRV0v2Snk+++X472edSSVNbbHe/pPmSlkj66Q7qGpt8a16YxNdzV2KzklI2/UJSLXAbcKqkBkk9dvTZltQo6ebktfmSRkh6RtL/lXRpsk1PSXOT9765Jd5W9vs/W/z7tNrHylJEeElpAa4E7trJ6xOB/5U8/zowHxgEjAL+Ru4b5W7AS8DxQCXwETAYEPA4MDt5/83A95LnewHvAHsAFwNNQO8dxPAC0NDKcnIr204FHk32fRbwGXBkEuMCoDbZrnfy2A2oB4Yl6/VAHdAH+COwR1J+LXBjsf9eXjpnKcN+cTFwT/J8h59toBG4LHl+F7AI6AVUAauS8u7AN1rU9R6gZH1d8jgWmJK0dTdgNnBisf+unbF4CKiAJN1LrkP9PSL+idwHbZikf0k22ZNcJ/s78GpENCXvawAGAuuADyLi3aT8YXKdmaSuMyVdnaxXAv2T53Mi4i+txRQR7R0z/8+ICElvAisj4s0kliVJjA3AeZImkuts1cBQcp1xi2OTsheTL8BfI/efjXVBZdIvtmjrs/1U8vgm0DMiPgc+l7RB0l7A/wNulnQi8BWwP7AvsKJFHWOT5fVkvSe5f58/djDmkuEEla4lwDlbViJikqQ+5L4RQu4b0BUR8UzLN0kaBWxsUbSZf/xtdjRZooBzIuLP29R1DLkPfetvkl4g9y1uW1dHxLOtlG+J66ttYvwK6C5pEHA18E8R8Wky9FfZSqxzIuJfdxSXlbVy7Bct97ezz/ZO+w9wAbkjqqMiYpOkRlrvPz+LiAd2EkdZ8jmodD0HVEq6rEXZ7i2ePwNcJqkCQNIhkvbYSX1vA4MkHZSst+wEzwBXtBiTH55PgBFxQkTUtrLsrBPuzDfIdfy/SdoX+OdWtnkZGCnp4CTW3SUd0sH9Wekp536xq5/tPckN922SdBIwoJVtngH+W4tzW/tL6tuOfZQsJ6gURW7AeDzwbUkfSHoVmEZuXBrg18BbwEJJi4EH2MlRbERsIDd08bvkZPCyFi/fBFQAi5K6bkq7PfmIiDfIDT0sAR4CXmxlm9Xkxu1nSFpErlMP6cQwrYjKuV+k8Nl+BKiTNJ/c0dTbrezjD8B04KVkqP0JWj/aKztbTsaZmZllio+gzMwsk5ygzMwsk5ygzMwsk5ygzMwskzo1QY0bNy7I/X7Bi5eusHSI+4mXLri0qlMT1Jo1azpzd2Ylyf3ELMdDfGZmlklOUGZmlklOUGZmlkmeLNbMyt6mTZtoampiw4YNxQ6lS6usrKSmpoaKioq8tneCMrOy19TURK9evRg4cCDJPLLWySKCtWvX0tTUxKBBg/J6j4f4zKzsbdiwgX322cfJqYgksc8++7TrKNYJqggGVFcjKZVlQHV1sZtjVhKcnIqvvX8DD/EVwYcrVtC0X00qddV80pRKPWZmWeMjKDPrctIcxch3JKNbt27U1tZyxBFHcO6557J+/frm12bOnIkk3n77H7eDamxs5IgjjgCgvr6ePffck+HDh3PooYdy4oknMnv27K3qnzJlCkOGDGHIkCEcffTRzJs3r/m1UaNGceihh1JbW0ttbS1PPPHEVjFtWRobG3flnzV1PoIysy4nzVEMyG8ko0ePHjQ0NABwwQUX8Mtf/pIf/ehHAMyYMYPjjz+eRx99lJ/85Cetvv+EE05oTkoNDQ2MHz+eHj16MGbMGGbPns0DDzzAvHnz6NOnDwsXLmT8+PG8+uqr9OvXD4BHHnmEurq6HcaURT6CMjPrZCeccALvvfceAOvWrePFF1/kwQcf5NFHH83r/bW1tdx4443cc889ANx6663cfvvt9OnTB4ARI0YwYcIE7r333sI0oJM4QZmZdaIvv/yS3//+9xx55JEAzJo1i3HjxnHIIYfQu3dvFi5cmFc9I0aMaB4SXLJkCUcdddRWr9fV1bFkyZLm9QsuuKB5KG/t2rUAfPHFF81lZ599dhrNS5WH+MzMOsGWZAC5I6hLLrkEyA3vTZ48GYDzzz+fGTNmMGLEiDbri9jhJODNr7e8aq4Uh/jySlCSGoHPgc3AlxFRJ6k38BgwEGgEzouITwsTpplZaWstGaxdu5bnnnuOxYsXI4nNmzcjidtuu63N+l5//XUOO+wwAIYOHcqCBQsYPXp08+sLFy5k6NCh6Taik7VniO+kiKiNiC0p+DpgbkQMBuYm62ZmlqcnnniCiy66iGXLltHY2MhHH33EoEGDtroCrzWLFi3ipptuYtKkSQBcc801XHvttc1Ddw0NDUydOpUf/OAHBW9DIe3KEN9ZwKjk+TSgHrh2F+MxMyu4/v36pfobwv7JlXLtNWPGDK67buvv9ueccw7Tp0/n2mu3/u/0hRdeYPjw4axfv56+ffvyi1/8gjFjxgBw5pln8vHHH3PcccchiV69evHwww9TXeI/5Fdb45gAkj4APiV358MHImKKpL9GxF4ttvk0IvZu5b0TgYkA/fv3P2rZsmWpBV+qJKX6Q918/oZWFHn/bN79pLCWLl3aPBxmxbWDv0WrfSXfIb6RETEC+GdgkqQT8w0mIqZERF1E1FVVVeX7NrMuxf3EbHt5JaiI+CR5XAXMBI4GVkqqBkgeVxUqSDMz63raTFCS9pDUa8tzYCywGHgKmJBsNgF4slBBmplZ15PPRRL7AjOT6+m7A9Mj4mlJrwGPS7oE+BA4t3BhmplZV9NmgoqI94FvtlK+FhhTiKDMzMw81ZGZmWWSE5SZdTn71fRP9XYb+9X0b3OfK1as4Pzzz+eggw5i6NChnHrqqbzzzjssWbKE0aNHc8ghhzB48GBuuumm5p+OTJ06lcsvv3y7ugYOHMiaNWu2Kps6dSpVVVVb3T7jrbfeAuCdd97h1FNP5eCDD+awww7jvPPO47HHHmvermfPns2347jooouor6/n9NNPb6571qxZDBs2jCFDhnDkkUcya9as5tcuvvhi9t9/fzZu3AjAmjVrGDhwYLv/Jq3xXHx5GlBdzYcrVhQ7DDNLwfKPP+KYG59Orb5X/m3cTl+PCM4++2wmTJjQPGN5Q0MDK1eu5OKLL+b+++9n7NixrF+/nnPOOYf77ruveZaI9vjOd77TPMP5Fhs2bOC0007jzjvv5IwzzgDg+eefp6qqqnnqpVGjRnHHHXc0z9VXX1/f/P433niDq6++mjlz5jBo0CA++OADTjnlFA488ECGDRsG5O4r9dBDD3HZZZe1O+adcYLKk++Ca2Yd9fzzz1NRUcGll17aXFZbW8uDDz7IyJEjGTt2LAC7774799xzD6NGjepQgmrN9OnT+da3vtWcnABOOumkvN9/xx13cMMNNzBo0CAABg0axPXXX8/tt9/Ob37zGwAmT57MXXfdxfe///1UYt7CQ3xmZgW2ePHi7W6HAa3fJuOggw5i3bp1fPbZZ+3eT8thu9raWr744osd7jtf+dzKo3///hx//PHNCSstPoIyMyuSbW+J0dKOynemtSG+XdVajK2V3XDDDZx55pmcdtppqe3bR1BmZgV2+OGHs2DBglbL58+fv1XZ+++/T8+ePenVq1dB992e928bY2u38jj44IOpra3l8ccf7/C+tuUEZWZWYKNHj2bjxo386le/ai577bXXGDx4MPPmzePZZ58Fcjc1vPLKK7nmmmtS2/d3v/td/vSnP/G73/2uuezpp5/mzTffzOv9V199NT/72c9obGwEoLGxkZtvvpmrrrpqu21//OMfc8cdd6QSN3iIz8y6oOr9D2jzyrv21rczkpg5cyaTJ0/mlltuobKykoEDB3L33Xfz5JNPcsUVVzBp0iQ2b97MhRdeuNWl5VOnTt3qsu6XX34ZgGHDhrHbbrljjPPOO49hw4bx2GOPbXUvqfvuu4/jjjuO2bNnM3nyZCZPnkxFRQXDhg3j5z//eV5tq62t5dZbb+WMM85g06ZNVFRUcNtttzXfHbilww8/nBEjRuR92/q25HW7jbTU1dXFtoeKpSLtW2T4dhtdQvtPIlDa/SSrfLuN7CjE7TbMzMw6lROUmZllkhOUmXUJHgovvvb+DZygzKzsVVZWsnbtWiepIooI1q5dS2VlZd7v8VV8Zlb2ampqaGpqYvXq1cUOpUurrKykpib/C8ScoErc1+nYL85b079fP5YtX55KXWZZUlFR0TyXnJUOJ6gStxE8ia2ZlaW8z0FJ6ibpdUmzk/XekuZIejd53LtwYZqZWVfTnoskfggsbbF+HTA3IgYDc5N1MzOzVOSVoCTVAKcBv25RfBYwLXk+DRifbmhmZtaV5XsEdTdwDfBVi7J9I2I5QPLYt7U3Spooab6k+b6Cxqx17idm22szQUk6HVgVER2arz0ipkREXUTUVVVVdaQKs7LnfmK2vXyu4hsJnCnpVKAS+Iakh4GVkqojYrmkamBVIQM1M7Oupc0jqIi4PiJqImIgcD7wXER8D3gKmJBsNgF4smBRmplZl7MrUx3dApwi6V3glGTdzMwsFe36oW5E1AP1yfO1wJj0QzIzM/NksWZmllFOUGZmlklOUGZmlklOUGZmlklOUGZmlklOUGZmlklOUGZmlklOUGZmlklOUGZmlklOUGZmlklOUGZmlklOUGZmlklOUGZmlklOUGZmlklOUGZmlklOUGZmlklOUGZmlklOUGZmlkltJihJlZJelfSGpCWSfpqU95Y0R9K7yePehQ/XzMy6inyOoDYCoyPim0AtME7SscB1wNyIGAzMTdbNzMxS0WaCipx1yWpFsgRwFjAtKZ8GjC9IhGZm1iXldQ5KUjdJDcAqYE5EvALsGxHLAZLHvjt470RJ8yXNX716dVpxm5UV9xOz7eWVoCJic0TUAjXA0ZKOyHcHETElIuoioq6qqqqjcZqVNfcTs+216yq+iPgrUA+MA1ZKqgZIHlelHp2ZmXVZ+VzFVyVpr+R5D+Bk4G3gKWBCstkE4MlCBWlmZl1P9zy2qQamSepGLqE9HhGzJb0EPC7pEuBD4NwCxmlmZl1MmwkqIhYBw1spXwuMKURQZmZmnknCzMwyyQnKzMwyyQnKzMwyyQnKzMwyqawT1IDqaiSlspiZWefK5zLzkvXhihU07VeTSl01nzSlUo+ZmeWnrI+gzMysdDlBmZlZJjlBmZlZJjlBmZlZJjlBmZlZJjlBmZlZJjlBmZlZJjlBmZlZJjlBmZlZJjlBmZlZJjlBmZlZJrWZoCQdIOl5SUslLZH0w6S8t6Q5kt5NHvcufLhmZtZV5HME9SVwVUQcBhwLTJI0FLgOmBsRg4G5ybqZmVkq2kxQEbE8IhYmzz8HlgL7A2cB05LNpgHjCxWkmZl1Pe06ByVpIDAceAXYNyKWQy6JAX138J6JkuZLmr969epdi9asTLmfmG0v7wQlqSfwW2ByRHyW7/siYkpE1EVEXVVVVUdiNCt77idm28srQUmqIJecHomI/0iKV0qqTl6vBlYVJkQzM+uK8rmKT8CDwNKIuLPFS08BE5LnE4An0w/POtPXYae3vW/PMqC6utjNMbMSl88t30cCFwJvSmpIym4AbgEel3QJ8CFwbmFCtM6yEWjaryaVumo+aUqlHjPrutpMUBExD9AOXh6TbjjZpW4Vqf2nq+5fS6+ubhWp1GNmljX5HEEZEJs3ccyNT6dS1yv/Ni7VuszMypGnOjIzs0xygjIzs0xygjIzs0xygjIzs0xygjIzs0xygjIzs0xygjIzs0xygjIzs0xygjIzs0wq65kk0pyeyMzMOldZJ6i0pycyM7PO4yE+MzPLJCcoMzPLJCcoMzPLpLI+B9UVpHqfKt9byjJkQHU1H65YkUpdPXbrxhdfbU6lrv79+rFs+fJU6rKdc4Iqcb4QxMrVhytWpHqHZ98tuvS0OcQn6SFJqyQtblHWW9IcSe8mj3sXNkwzM+tq8jkHNRXY9qv1dcDciBgMzE3WzZp9HZCUyjKgurrYzTGzImhziC8i/ihp4DbFZwGjkufTgHrg2hTjshK3ETykYma7pKNX8e0bEcsBkse+O9pQ0kRJ8yXNX716dQd3Z1beyqWfDKiuTu3I2azgF0lExBRgCkBdXV0Uen9mpahc+knaFzZY19bRI6iVkqoBksdV6YVkZmbW8QT1FDAheT4BeDKdcMzMzHLyucx8BvAScKikJkmXALcAp0h6FzglWTczM0tNPlfx/esOXhqTcixmZmbNMjcXn68CMjMzyOBUR74KyMzMIIMJyorHE8+aWZY4QVkzTzxrZlmSuXNQZmZm4ARlZmYZ5QRlZmaZ5ARlZmaZ5ARlmed7SxWWf3toWeWr+CzzfG+pwvJvDy2rnKCsIPybKjPbVU5QVhD+TZWZ7SqfgzIzs0zyEZRlXprDhbt1q0jtZH7/fv1Ytnx5KnWVi1SHdrt/zcPE7TCgupoPV6xIpa6sfLadoCzz0h4u9AUBhZP238rDxPkrx4tdPMRnZmaZlLkjqDSHCMzMrHRlLkH56i8zM4NdTFCSxgE/B7oBv46IW1KJyqxAyuX3WWmeELf2SfNCm926V/DVl5tSqascdThBSeoG3AucAjQBr0l6KiLeSis4s7SVyxF6OZ4QLxVf+aKdTrMrF0kcDbwXEe9HxN+BR4Gz0gnLzMy6OkVEx94o/QswLiL+e7J+IXBMRFy+zXYTgYnJ6qHAnzse7lb6AGtSqisL3J7s6mhb1kREXodZ7id5c3uyLdW+sivnoFobhN0u20XEFGDKLuyn9Z1L8yOiLu16i8Xtya7OaIv7SX7cnmxLuz27MsTXBBzQYr0G+GTXwjEzM8vZlQT1GjBY0iBJXwPOB55KJywzM+vqOjzEFxFfSroceIbcZeYPRcSS1CJrW+rDIUXm9mRXKbellGNvjduTbam2p8MXSZiZmRWS5+IzM7NMcoIyM7NMynyCknSApOclLZW0RNIPk/LekuZIejd53LvYseZDUqWkVyW9kbTnp0l5SbZnC0ndJL0uaXayXrLtkdQo6U1JDZLmJ2WZb4/7Sva5n7RP5hMU8CVwVUQcBhwLTJI0FLgOmBsRg4G5yXop2AiMjohvArXAOEnHUrrt2eKHwNIW66XenpMiorbFbzpKoT3uK9nnftIeEVFSC/Akufn//gxUJ2XVwJ+LHVsH2rI7sBA4ppTbQ+43cHOB0cDspKyU29MI9NmmrOTa476SrcX9pP1LKRxBNZM0EBgOvALsGxHLAZLHvsWLrH2Sw/wGYBUwJyJKuj3A3cA1wFctykq5PQH8QdKCZAoiKLH2uK9kkvtJO2XuflA7Iqkn8FtgckR8ltZ098UQEZuBWkl7ATMlHVHsmDpK0unAqohYIGlUseNJyciI+ERSX2COpLeLHVB7uK9kj/tJx5TEEZSkCnId7pGI+I+keKWk6uT1anLfsEpKRPwVqAfGUbrtGQmcKamR3Iz2oyU9TOm2h4j4JHlcBcwkN3N/SbTHfSWz3E86IPMJSrmvfw8CSyPizhYvPQVMSJ5PIDfennmSqpJvg0jqAZwMvE2Jticiro+ImogYSG66q+ci4nuUaHsk7SGp15bnwFhgMSXQHveV7HI/6aBin2jL40Tc8eTGOhcBDclyKrAPuROO7yaPvYsda57tGQa8nrRnMXBjUl6S7dmmbaP4x8nfkmwPcCDwRrIsAX5cKu1xXymNxf0k/8VTHZmZWSZlfojPzMy6JicoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCcoMzPLJCeoMiBpVjJh45ItkzZKukTSO5LqJf1K0j1JeZWk30p6LVlGFjd6s87jvlJa/EPdMiCpd0T8JZkO5jXgvwAvAiOAz4HngDci4nJJ04H7ImKepP7AM5G7f5BZ2XNfKS0lM5u57dSVks5Onh8AXAj8n4j4C4CkfwcOSV4/GRjaYobrb0jqFRGfd2bAZkXivlJCnKBKXDJ1/8nAtyJivaR6cjcN29E3vd2Sbb/onAjNssF9pfT4HFTp2xP4NOlwQ8jd6nt34NuS9pbUHTinxfZ/AC7fsiKptlOjNSse95US4wRV+p4GuktaBNwEvAx8DNxM7m6qzwJvAX9Ltr8SqJO0SNJbwKWdH7JZUbivlBhfJFGmJPWMiHXJt8KZwEMRMbPYcZlljftKdvkIqnz9RFIDufvofADMKnI8ZlnlvpJRPoIyM7NM8hGUmZllkhOUmZllkhOUmZllkhOUmZllkhOUmZll0v8HzHNLcwZAb84AAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x216 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "bins = np.linspace(df.age.min(), df.age.max(), 10)\n",
    "g = sns.FacetGrid(df, col=\"Gender\", hue=\"loan_status\", palette=\"Set1\", col_wrap=2)\n",
    "g.map(plt.hist, 'age', bins=bins, ec=\"k\")\n",
    "\n",
    "g.axes[-1].legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "# Pre-processing:  Feature selection/extraction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "### Lets look at the day of the week people get the loan "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAADQCAYAAABStPXYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAGepJREFUeJzt3XmcVPW55/HPV2gvIriC2tIBWkQQldtgR+OCQUh4EdzwuoTEKGTMdTQuYQyDSzImN84YF8YlcSVq8EbEhUTMJTcaVIjgztKCiCFebbEVFJgYYxQFfeaPOt1poKGr6VPU6erv+/WqV1edOud3ntNdTz91fnXq91NEYGZmljU7FDsAMzOzprhAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlAmZlZJrlApUTS3pLuk/S6pAWSnpV0ckptD5U0M422tgdJcyRVFzsOK75SygtJ3SU9L2mRpCEF3M+HhWq7rXGBSoEkATOApyJiv4g4FBgDVBQpno7F2K9ZYyWYF8OBVyNiUETMTSMm2zoXqHQMAz6NiNvrF0TEmxHxcwBJHSRdJ+lFSYsl/fdk+dDkbGO6pFclTU2SGkkjk2XzgH+pb1fSzpLuTtpaJOmkZPk4SQ9J+g/gD605GElTJN0maXbyzvfLyT6XSZrSaL3bJM2XtFTSv22hrRHJu+aFSXxdWhObtSklkxeSqoBrgVGSaiTttKXXtqRaSVclz82XNFjSY5L+S9K5yTpdJD2RbLukPt4m9vs/G/1+msyxkhYRvrXyBlwE3LCV588Bfpjc/ydgPlAJDAX+Su4d5Q7As8DRQCfgLaAvIOBBYGay/VXAt5L7uwHLgZ2BcUAdsMcWYpgL1DRx+0oT604B7k/2fRLwAXBIEuMCoCpZb4/kZwdgDjAweTwHqAa6AU8BOyfLLwGuKPbfy7ftcyvBvBgH3Jzc3+JrG6gFzkvu3wAsBroC3YH3kuUdgV0atfUaoOTxh8nPEcDk5Fh3AGYCxxT777o9b+4KKgBJt5BLqE8j4ovkXmgDJZ2arLIruST7FHghIuqS7WqA3sCHwBsR8edk+b3kkpmkrRMlTUgedwJ6JvdnRcT/ayqmiGhpn/l/RERIWgK8GxFLkliWJjHWAKdLOodcspUDA8glY70vJcueTt4A70jun421QyWSF/Wae23/Nvm5BOgSEX8D/iZpnaTdgL8DV0k6Bvgc6AHsDaxq1MaI5LYoedyF3O/nqW2Muc1xgUrHUuCU+gcRcb6kbuTeEULuHdCFEfFY440kDQU+abToM/7xN9nSIIkCTomIP23S1uHkXvRNbyTNJfcublMTIuLxJpbXx/X5JjF+DnSUVAlMAL4YEX9Juv46NRHrrIj4xpbispJWinnReH9be21vNX+AM8idUR0aEesl1dJ0/vw0Iu7YShwlzZ9BpeNJoJOk8xot69zo/mPAeZLKACQdIGnnrbT3KlApqU/yuHESPAZc2KhPflA+AUbEkIioauK2tSTcml3IJf5fJe0NfK2JdZ4DjpK0fxJrZ0kHbOP+rO0p5bxo7Wt7V3LdfeslHQv0amKdx4D/1uizrR6S9mrBPto8F6gURK7DeDTwZUlvSHoBuIdcvzTAncArwEJJLwN3sJWz14hYR67r4nfJh8FvNnr6SqAMWJy0dWXax5OPiHiJXNfDUuBu4Okm1llNrt9+mqTF5JK6/3YM04qolPMihdf2VKBa0nxyZ1OvNrGPPwD3Ac8mXe3Tafpsr2TVfyhnZmaWKT6DMjOzTHKBMjOzTHKBMjOzTHKBMjOzTNquBWrkyJFB7nsMvvlWqrdWc5741g5uedmuBWrNmjXbc3dmbZLzxCzHXXxmZpZJLlBmZpZJLlBmZpZJHizWzErO+vXrqaurY926dcUOpV3r1KkTFRUVlJWVbdP2LlBmVnLq6uro2rUrvXv3Jhk/1raziGDt2rXU1dVRWVm5TW24i8/MSs66devYc889XZyKSBJ77rlnq85iXaCs5PUqL0dSq2+9ysuLfSjWAi5Oxdfav4G7+KzkrVi1irp9K1rdTsU7dSlEY2b58hmUmZW8tM6iW3I23aFDB6qqqjj44IM57bTT+Oijjxqee/jhh5HEq6/+Yxqo2tpaDj74YADmzJnDrrvuyqBBg+jXrx/HHHMMM2fO3Kj9yZMn079/f/r3789hhx3GvHnzGp4bOnQo/fr1o6qqiqqqKqZPn75RTPW32tra1vxaCy6vMyhJ/wP4DrkhKpYA3wbKgfuBPYCFwJkR8WmB4jQz22ZpnUXXy+dseqeddqKmpgaAM844g9tvv52LL74YgGnTpnH00Udz//338+Mf/7jJ7YcMGdJQlGpqahg9ejQ77bQTw4cPZ+bMmdxxxx3MmzePbt26sXDhQkaPHs0LL7zAPvvsA8DUqVOprq7eYkxtQbNnUJJ6ABcB1RFxMNABGANcA9wQEX2BvwBnFzJQM7O2asiQIbz22msAfPjhhzz99NPcdddd3H///XltX1VVxRVXXMHNN98MwDXXXMN1111Ht27dABg8eDBjx47llltuKcwBFEm+XXwdgZ0kdQQ6AyuBYeSmIIbcNM6j0w/PzKxt27BhA7///e855JBDAJgxYwYjR47kgAMOYI899mDhwoV5tTN48OCGLsGlS5dy6KGHbvR8dXU1S5cubXh8xhlnNHTlrV27FoCPP/64YdnJJ5+cxuEVVLNdfBHxtqRJwArgY+APwALg/YjYkKxWB/RoantJ5wDnAPTs2TONmM1KjvOk9NQXA8idQZ19dq6Tadq0aYwfPx6AMWPGMG3aNAYPHtxsexFbHwQ8Ija6aq4UuviaLVCSdgdOAiqB94GHgK81sWqTv72ImAxMBqiurs57mHWz9sR5UnqaKgZr167lySef5OWXX0YSn332GZK49tprm21v0aJFHHjggQAMGDCABQsWMGzYsIbnFy5cyIABA9I9iCLLp4vvK8AbEbE6ItYDvwGOBHZLuvwAKoB3ChSjmVlJmD59OmeddRZvvvkmtbW1vPXWW1RWVm50BV5TFi9ezJVXXsn5558PwMSJE7nkkksauu5qamqYMmUK3/3udwt+DNtTPlfxrQC+JKkzuS6+4cB8YDZwKrkr+cYCjxQqSDOz1ui5zz6pfo+tZ3KlXEtNmzaNSy+9dKNlp5xyCvfddx+XXHLJRsvnzp3LoEGD+Oijj9hrr7342c9+xvDhwwE48cQTefvttznyyCORRNeuXbn33nspL7Evk6u5fk0ASf8GfB3YACwid8l5D/5xmfki4FsR8cnW2qmuro758+e3NmazFpGU2hd188iXVg9f4DxpvWXLljV0h1lxbeFvkVee5PU9qIj4EfCjTRa/DhyWz/ZmZmYt5ZEkzMwsk1ygzMwsk1ygzMwsk1ygzMwsk1ygzMwsk1ygzKzk7VvRM9XpNvatyG84qlWrVjFmzBj69OnDgAEDGDVqFMuXL2fp0qUMGzaMAw44gL59+3LllVc2fIVhypQpXHDBBZu11bt3b9asWbPRsilTptC9e/eNptB45ZVXAFi+fDmjRo1i//3358ADD+T000/ngQceaFivS5cuDVNynHXWWcyZM4fjjz++oe0ZM2YwcOBA+vfvzyGHHMKMGTManhs3bhw9evTgk09y3yxas2YNvXv3btHfJB+esNDMSt7Kt9/i8CseTa29538ystl1IoKTTz6ZsWPHNoxaXlNTw7vvvsu4ceO47bbbGDFiBB999BGnnHIKt956a8NIES3x9a9/vWGU83rr1q3juOOO4/rrr+eEE04AYPbs2XTv3r1h+KWhQ4cyadKkhvH65syZ07D9Sy+9xIQJE5g1axaVlZW88cYbfPWrX2W//fZj4MCBQG5uqbvvvpvzzjuvxTHny2dQZmYFMHv2bMrKyjj33HMbllVVVbF8+XKOOuooRowYAUDnzp25+eabufrqq1Pb93333ccRRxzRUJwAjj322IYJEZszadIkLr/8ciorKwGorKzksssu47rrrmtYZ/z48dxwww1s2LBhS820mguUmVkBvPzyy5tNiQFNT5XRp08fPvzwQz744IMW76dxt11VVRUff/zxFvedr3ym8+jZsydHH300v/rVr7Z5P81xF5+Z2Xa06bQYjW1p+dY01cXXWk3F2NSyyy+/nBNPPJHjjjsu1f3X8xmUmVkBHHTQQSxYsKDJ5ZuOtfj666/TpUsXunbtWtB9t2T7TWNsajqP/fffn6qqKh588MFt3tfWuECZmRXAsGHD+OSTT/jFL37RsOzFF1+kb9++zJs3j8cffxzITWx40UUXMXHixNT2/c1vfpNnnnmG3/3udw3LHn30UZYsWZLX9hMmTOCnP/0ptbW1ANTW1nLVVVfx/e9/f7N1f/CDHzBp0qRU4t6Uu/jMrOSV9/hCXlfetaS95kji4YcfZvz48Vx99dV06tSJ3r17c+ONN/LII49w4YUXcv755/PZZ59x5plnbnRp+ZQpUza6rPu5554DYODAgeywQ+684vTTT2fgwIE88MADG80ndeutt3LkkUcyc+ZMxo8fz/jx4ykrK2PgwIHcdNNNeR1fVVUV11xzDSeccALr16+nrKyMa6+9tmGG4MYOOuggBg8enPfU9S2R13QbafE0AlYMnm6j/fF0G9nRmuk23MVnZmaZlKkC1au8PLVvevcqsZklzczam0x9BrVi1apUumKAVKd3NrO2Z2uXc9v20dqPkDJ1BmVmloZOnTqxdu3aVv+DtG0XEaxdu5ZOnTptcxuZOoMyM0tDRUUFdXV1rF69utihtGudOnWiomLbe8VcoMys5JSVlTWMI2dtl7v4zMwsk1ygzMwsk1ygzMwsk1ygzMwsk1ygzMwsk/IqUJJ2kzRd0quSlkk6QtIekmZJ+nPyc/dCB2tmZu1HvmdQNwGPRkR/4J+BZcClwBMR0Rd4InlsZmaWimYLlKRdgGOAuwAi4tOIeB84CbgnWe0eYHShgjQzs/YnnzOo/YDVwC8lLZJ0p6Sdgb0jYiVA8nOvpjaWdI6k+ZLm+1vdZk1znphtLp8C1REYDNwWEYOAv9OC7ryImBwR1RFR3b17920M06y0OU/MNpdPgaoD6iLi+eTxdHIF611J5QDJz/cKE6KZmbVHzRaoiFgFvCWpX7JoOPAK8FtgbLJsLPBIQSI0M7N2Kd/BYi8EpkraEXgd+Da54vagpLOBFcBphQnRrHXUoSyV+cHUoSyFaMwsX3kVqIioAaqbeGp4uuGYpS8+W8/hVzza6nae/8nIFKIxs3x5JAkzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8skFygzM8ukvAuUpA6SFkmamTyulPS8pD9LekDSjoUL08zM2puWnEF9D1jW6PE1wA0R0Rf4C3B2moGZmVn7lleBklQBHAfcmTwWMAyYnqxyDzC6EAGamVn7lO8Z1I3ARODz5PGewPsRsSF5XAf0aGpDSedImi9p/urVq1sVrFmpcp6Yba7ZAiXpeOC9iFjQeHETq0ZT20fE5Iiojojq7t27b2OYZqXNeWK2uY55rHMUcKKkUUAnYBdyZ1S7SeqYnEVVAO8ULkwzM2tvmj2DiojLIqIiInoDY4AnI+IMYDZwarLaWOCRgkVpZmbtTmu+B3UJcLGk18h9JnVXOiGZmZnl18XXICLmAHOS+68Dh6UfkpmZmUeSMDOzjHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKBMjOzTHKB2o56lZcjKZVbr/LyYh+OmVlBtWg+KGudFatWUbdvRSptVbxTl0o7ZmZZ5TMoMzPLJBcoMzPLJBcoMzPLJBcoMzPLJBcoMzPLJBcoMzPLJBcoMzPLJBcoMzPLJBcoMzPLpGYLlKQvSJotaZmkpZK+lyzfQ9IsSX9Ofu5e+HDNzKy9yOcMagPw/Yg4EPgScL6kAcClwBMR0Rd4InlsZmaWimYLVESsjIiFyf2/AcuAHsBJwD3JavcAowsVpJmZtT8t+gxKUm9gEPA8sHdErIRcEQP22sI250iaL2n+6tWrWxetWYlynphtLu8CJakL8GtgfER8kO92ETE5Iqojorp79+7bEqNZyXOemG0urwIlqYxccZoaEb9JFr8rqTx5vhx4rzAhmplZe5TPVXwC7gKWRcT1jZ76LTA2uT8WeCT98MzMrL3KZ8LCo4AzgSWSapJllwNXAw9KOhtYAZxWmBDNzKw9arZARcQ8QFt4eni64ZiZWTH0Ki9nxapVqbTVc599eHPlyla34ynfzcyMFatWUbdvRSptVbxTl0o7HurIMqlXeTmSUrmVorR+P73Ky4t9KGZb5DMoy6QsvpvLkrR+P6X4u7HS4TMoMzPLpJI9g/onSK17J60P/Cx/6lDmd/dm7VzJFqhPwF1EbVh8tp7Dr3g0lbae/8nIVNoxs+3LXXxmZpZJLlBmZpZJLlBmZpZJLlBmZpZJLlBmZpZJLlBmZpZJLlBmZpZJLlBmZpZJLlBmZpZJLlBmZpZJJTvUkZmZ5S/N8S/VoSyVdlygzMwsk+NfuovPrB2rH/Xfkx9aFvkMyqwd86j/lmU+gzIzs0xygbLU7FvRM7XuIjMzd/FZala+/VbmPmQ1s7YrUwUqi5c5mtn216u8nBWrVrW6nZ777MObK1emEJEVQ6YKVBYvc8yq+quv0uAktqxZsWpVKhdv+MKNtq1VBUrSSOAmoANwZ0RcnUpU1ixffWVmpW6bL5KQ1AG4BfgaMAD4hqQBaQVmZtZaWf2eV6/y8lRi6tyhY0lfmNSaM6jDgNci4nUASfcDJwGvpBGYmVlrZbWnIc0uzCweX1oUEdu2oXQqMDIivpM8PhM4PCIu2GS9c4Bzkof9gD9tpdluwJptCqht8PG1bfkc35qIaPEHoC3Mk3xjact8fG1bc8eXV5605gyqqXPCzapdREwGJufVoDQ/IqpbEVOm+fjatkIeX0vypNCxZIGPr21L6/ha80XdOuALjR5XAO+0LhwzM7Oc1hSoF4G+kiol7QiMAX6bTlhmZtbebXMXX0RskHQB8Bi5y8zvjoilrYwn7y6ONsrH17Zl6fiyFEsh+PjatlSOb5svkjAzMyskDxZrZmaZ5AJlZmaZlJkCJWmkpD9Jek3SpcWOJ02SviBptqRlkpZK+l6xY0qbpA6SFkmaWexYCkHSbpKmS3o1+TseUaQ4nCdtXCnnStp5konPoJJhk5YDXyV3+fqLwDcioiRGpZBUDpRHxEJJXYEFwOhSOT4ASRcD1cAuEXF8seNJm6R7gLkRcWdy1WrniHh/O8fgPCkBpZwraedJVs6gGoZNiohPgfphk0pCRKyMiIXJ/b8By4AexY0qPZIqgOOAO4sdSyFI2gU4BrgLICI+3d7FKeE8aeNKOVcKkSdZKVA9gLcaPa6jxF6Y9ST1BgYBzxc3klTdCEwEPi92IAWyH7Aa+GXSNXOnpJ2LEIfzpO0r5VxJPU+yUqDyGjaprZPUBfg1MD4iPih2PGmQdDzwXkQsKHYsBdQRGAzcFhGDgL8Dxfj8x3nShrWDXEk9T7JSoEp+2CRJZeSSbmpE/KbY8aToKOBESbXkupyGSbq3uCGlrg6oi4j6d/PTySViMeJwnrRdpZ4rqedJVgpUSQ+bpNxkK3cByyLi+mLHk6aIuCwiKiKiN7m/25MR8a0ih5WqiFgFvCWpX7JoOMWZVsZ50oaVeq4UIk8yMeV7gYZNypKjgDOBJZJqkmWXR8R/FjEma5kLgalJYXgd+Pb2DsB5Ym1AqnmSicvMzczMNpWVLj4zM7ONuECZmVkmuUCZmVkmuUCZmVkmuUCZmVkmuUBlgKQfS5qQYnv9JdUkw430SavdRu3PkVSddrtmW+M8aX9coErTaOCRiBgUEf9V7GDMMsp5knEuUEUi6QfJvD6PA/2SZf8q6UVJL0n6taTOkrpKeiMZAgZJu0iqlVQmqUrSc5IWS3pY0u6SRgHjge8kc+tMlHRRsu0Nkp5M7g+vH2ZF0ghJz0paKOmhZCw0JB0q6Y+SFkh6LJkOofEx7CDpHkn/e7v94qxdcZ60by5QRSDpUHJDnQwC/gX4YvLUbyLiixHxz+SmGjg7mXZgDrkh+km2+3VErAf+HbgkIgYCS4AfJd+6vx24ISKOBZ4ChiTbVgNdkiQ+GpgrqRvwQ+ArETEYmA9cnKzzc+DUiDgUuBv4P40OoyMwFVgeET9M8ddjBjhPLCNDHbVDQ4CHI+IjAEn146kdnLzL2g3oQm5IG8jNHTMRmEFu6JB/lbQrsFtE/DFZ5x7goSb2tQA4VLkJ4D4BFpJLwCHARcCXgAHA07mh0NgReJbcu9WDgVnJ8g7Aykbt3gE8GBGNk9EsTc6Tds4FqniaGmNqCrkZRF+SNA4YChART0vqLenLQIeIeDlJvOZ3ErFeudGTvw08AywGjgX6kHv32QeYFRHfaLydpEOApRGxpSmbnwGOlfR/I2JdPrGYbQPnSTvmLr7ieAo4WdJOyTu2E5LlXYGVSbfBGZts8+/ANOCXABHxV+Avkuq7Jc4E/kjTngImJD/nAucCNZEbiPE54ChJ+wMk/fkHAH8Cuks6IlleJumgRm3eBfwn8JAkv9GxQnCetHMuUEWQTGv9AFBDbu6buclT/4vcDKKzgFc32WwqsDu55Ks3FrhO0mKgCvjJFnY5FygHno2Id4F19fuMiNXAOGBa0s5zQP9kSvFTgWskvZTEeuQmx3E9ua6QX0nya8lS5Twxj2beRkg6FTgpIs4sdixmWeU8KS0+5WwDJP0c+BowqtixmGWV86T0+AzKzMwyyf2hZmaWSS5QZmaWSS5QZmaWSS5QZmaWSS5QZmaWSf8feZ3K8s9z83MAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x216 with 2 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "df['dayofweek'] = df['effective_date'].dt.dayofweek\n",
    "bins = np.linspace(df.dayofweek.min(), df.dayofweek.max(), 10)\n",
    "g = sns.FacetGrid(df, col=\"Gender\", hue=\"loan_status\", palette=\"Set1\", col_wrap=2)\n",
    "g.map(plt.hist, 'dayofweek', bins=bins, ec=\"k\")\n",
    "g.axes[-1].legend()\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "We see that people who get the loan at the end of the week dont pay it off, so lets use Feature binarization to set a threshold values less then day 4 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "outputs": [
    {
     "ename": "KeyError",
     "evalue": "'dayofweek'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mKeyError\u001b[0m                                  Traceback (most recent call last)",
      "\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/pandas/core/indexes/base.py\u001b[0m in \u001b[0;36mget_loc\u001b[0;34m(self, key, method, tolerance)\u001b[0m\n\u001b[1;32m   2896\u001b[0m             \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2897\u001b[0;31m                 \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   2898\u001b[0m             \u001b[0;32mexcept\u001b[0m \u001b[0mKeyError\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32mpandas/_libs/index.pyx\u001b[0m in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[0;34m()\u001b[0m\n",
      "\u001b[0;32mpandas/_libs/index.pyx\u001b[0m in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[0;34m()\u001b[0m\n",
      "\u001b[0;32mpandas/_libs/hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable.get_item\u001b[0;34m()\u001b[0m\n",
      "\u001b[0;32mpandas/_libs/hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable.get_item\u001b[0;34m()\u001b[0m\n",
      "\u001b[0;31mKeyError\u001b[0m: 'dayofweek'",
      "\nDuring handling of the above exception, another exception occurred:\n",
      "\u001b[0;31mKeyError\u001b[0m                                  Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-10-255714554ca9>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdf\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'weekend'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'dayofweek'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;32mlambda\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;36m1\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m>\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m)\u001b[0m  \u001b[0;32melse\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m      2\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhead\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/pandas/core/frame.py\u001b[0m in \u001b[0;36m__getitem__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m   2978\u001b[0m             \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnlevels\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   2979\u001b[0m                 \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_getitem_multilevel\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2980\u001b[0;31m             \u001b[0mindexer\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   2981\u001b[0m             \u001b[0;32mif\u001b[0m \u001b[0mis_integer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mindexer\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   2982\u001b[0m                 \u001b[0mindexer\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mindexer\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/conda/envs/python/lib/python3.6/site-packages/pandas/core/indexes/base.py\u001b[0m in \u001b[0;36mget_loc\u001b[0;34m(self, key, method, tolerance)\u001b[0m\n\u001b[1;32m   2897\u001b[0m                 \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   2898\u001b[0m             \u001b[0;32mexcept\u001b[0m \u001b[0mKeyError\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2899\u001b[0;31m                 \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_maybe_cast_indexer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   2900\u001b[0m         \u001b[0mindexer\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_indexer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmethod\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtolerance\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mtolerance\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   2901\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0mindexer\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mndim\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m1\u001b[0m \u001b[0;32mor\u001b[0m \u001b[0mindexer\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msize\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32mpandas/_libs/index.pyx\u001b[0m in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[0;34m()\u001b[0m\n",
      "\u001b[0;32mpandas/_libs/index.pyx\u001b[0m in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[0;34m()\u001b[0m\n",
      "\u001b[0;32mpandas/_libs/hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable.get_item\u001b[0;34m()\u001b[0m\n",
      "\u001b[0;32mpandas/_libs/hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable.get_item\u001b[0;34m()\u001b[0m\n",
      "\u001b[0;31mKeyError\u001b[0m: 'dayofweek'"
     ]
    }
   ],
   "source": [
    "df['weekend'] = df['dayofweek'].apply(lambda x: 1 if (x>3)  else 0)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "## Convert Categorical features to numerical values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "Lets look at gender:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "button": false,
    "collapsed": true,
    "deletable": true,
    "jupyter": {
     "outputs_hidden": true
    },
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Gender  loan_status\n",
       "female  PAIDOFF        0.865385\n",
       "        COLLECTION     0.134615\n",
       "male    PAIDOFF        0.731293\n",
       "        COLLECTION     0.268707\n",
       "Name: loan_status, dtype: float64"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(['Gender'])['loan_status'].value_counts(normalize=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "86 % of female pay there loans while only 73 % of males pay there loan\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "Lets convert male to 0 and female to 1:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "button": false,
    "collapsed": true,
    "deletable": true,
    "jupyter": {
     "outputs_hidden": true
    },
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>Unnamed: 0.1</th>\n",
       "      <th>loan_status</th>\n",
       "      <th>Principal</th>\n",
       "      <th>terms</th>\n",
       "      <th>effective_date</th>\n",
       "      <th>due_date</th>\n",
       "      <th>age</th>\n",
       "      <th>education</th>\n",
       "      <th>Gender</th>\n",
       "      <th>dayofweek</th>\n",
       "      <th>weekend</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>PAIDOFF</td>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>2016-09-08</td>\n",
       "      <td>2016-10-07</td>\n",
       "      <td>45</td>\n",
       "      <td>High School or Below</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>PAIDOFF</td>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>2016-09-08</td>\n",
       "      <td>2016-10-07</td>\n",
       "      <td>33</td>\n",
       "      <td>Bechalor</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>PAIDOFF</td>\n",
       "      <td>1000</td>\n",
       "      <td>15</td>\n",
       "      <td>2016-09-08</td>\n",
       "      <td>2016-09-22</td>\n",
       "      <td>27</td>\n",
       "      <td>college</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>PAIDOFF</td>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>2016-09-09</td>\n",
       "      <td>2016-10-08</td>\n",
       "      <td>28</td>\n",
       "      <td>college</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>6</td>\n",
       "      <td>6</td>\n",
       "      <td>PAIDOFF</td>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>2016-09-09</td>\n",
       "      <td>2016-10-08</td>\n",
       "      <td>29</td>\n",
       "      <td>college</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Unnamed: 0  Unnamed: 0.1 loan_status  Principal  terms effective_date  \\\n",
       "0           0             0     PAIDOFF       1000     30     2016-09-08   \n",
       "1           2             2     PAIDOFF       1000     30     2016-09-08   \n",
       "2           3             3     PAIDOFF       1000     15     2016-09-08   \n",
       "3           4             4     PAIDOFF       1000     30     2016-09-09   \n",
       "4           6             6     PAIDOFF       1000     30     2016-09-09   \n",
       "\n",
       "    due_date  age             education  Gender  dayofweek  weekend  \n",
       "0 2016-10-07   45  High School or Below       0          3        0  \n",
       "1 2016-10-07   33              Bechalor       1          3        0  \n",
       "2 2016-09-22   27               college       0          3        0  \n",
       "3 2016-10-08   28               college       1          4        1  \n",
       "4 2016-10-08   29               college       0          4        1  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['Gender'].replace(to_replace=['male','female'], value=[0,1],inplace=True)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "## One Hot Encoding  \n",
    "#### How about education?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "button": false,
    "collapsed": true,
    "deletable": true,
    "jupyter": {
     "outputs_hidden": true
    },
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "education             loan_status\n",
       "Bechalor              PAIDOFF        0.750000\n",
       "                      COLLECTION     0.250000\n",
       "High School or Below  PAIDOFF        0.741722\n",
       "                      COLLECTION     0.258278\n",
       "Master or Above       COLLECTION     0.500000\n",
       "                      PAIDOFF        0.500000\n",
       "college               PAIDOFF        0.765101\n",
       "                      COLLECTION     0.234899\n",
       "Name: loan_status, dtype: float64"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(['education'])['loan_status'].value_counts(normalize=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "#### Feature befor One Hot Encoding"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "button": false,
    "collapsed": true,
    "deletable": true,
    "jupyter": {
     "outputs_hidden": true
    },
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Principal</th>\n",
       "      <th>terms</th>\n",
       "      <th>age</th>\n",
       "      <th>Gender</th>\n",
       "      <th>education</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>45</td>\n",
       "      <td>0</td>\n",
       "      <td>High School or Below</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>33</td>\n",
       "      <td>1</td>\n",
       "      <td>Bechalor</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1000</td>\n",
       "      <td>15</td>\n",
       "      <td>27</td>\n",
       "      <td>0</td>\n",
       "      <td>college</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>28</td>\n",
       "      <td>1</td>\n",
       "      <td>college</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>29</td>\n",
       "      <td>0</td>\n",
       "      <td>college</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Principal  terms  age  Gender             education\n",
       "0       1000     30   45       0  High School or Below\n",
       "1       1000     30   33       1              Bechalor\n",
       "2       1000     15   27       0               college\n",
       "3       1000     30   28       1               college\n",
       "4       1000     30   29       0               college"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[['Principal','terms','age','Gender','education']].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "#### Use one hot encoding technique to conver categorical varables to binary variables and append them to the feature Data Frame "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "button": false,
    "collapsed": true,
    "deletable": true,
    "jupyter": {
     "outputs_hidden": true
    },
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Principal</th>\n",
       "      <th>terms</th>\n",
       "      <th>age</th>\n",
       "      <th>Gender</th>\n",
       "      <th>weekend</th>\n",
       "      <th>Bechalor</th>\n",
       "      <th>High School or Below</th>\n",
       "      <th>college</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>45</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>33</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1000</td>\n",
       "      <td>15</td>\n",
       "      <td>27</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>28</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>29</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Principal  terms  age  Gender  weekend  Bechalor  High School or Below  \\\n",
       "0       1000     30   45       0        0         0                     1   \n",
       "1       1000     30   33       1        0         1                     0   \n",
       "2       1000     15   27       0        0         0                     0   \n",
       "3       1000     30   28       1        1         0                     0   \n",
       "4       1000     30   29       0        1         0                     0   \n",
       "\n",
       "   college  \n",
       "0        0  \n",
       "1        0  \n",
       "2        1  \n",
       "3        1  \n",
       "4        1  "
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Feature = df[['Principal','terms','age','Gender','weekend']]\n",
    "Feature = pd.concat([Feature,pd.get_dummies(df['education'])], axis=1)\n",
    "Feature.drop(['Master or Above'], axis = 1,inplace=True)\n",
    "Feature.head()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "### Feature selection"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "Lets defind feature sets, X:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "button": false,
    "collapsed": true,
    "deletable": true,
    "jupyter": {
     "outputs_hidden": true
    },
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Principal</th>\n",
       "      <th>terms</th>\n",
       "      <th>age</th>\n",
       "      <th>Gender</th>\n",
       "      <th>weekend</th>\n",
       "      <th>Bechalor</th>\n",
       "      <th>High School or Below</th>\n",
       "      <th>college</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>45</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>33</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1000</td>\n",
       "      <td>15</td>\n",
       "      <td>27</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>28</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1000</td>\n",
       "      <td>30</td>\n",
       "      <td>29</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Principal  terms  age  Gender  weekend  Bechalor  High School or Below  \\\n",
       "0       1000     30   45       0        0         0                     1   \n",
       "1       1000     30   33       1        0         1                     0   \n",
       "2       1000     15   27       0        0         0                     0   \n",
       "3       1000     30   28       1        1         0                     0   \n",
       "4       1000     30   29       0        1         0                     0   \n",
       "\n",
       "   college  \n",
       "0        0  \n",
       "1        0  \n",
       "2        1  \n",
       "3        1  \n",
       "4        1  "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X = Feature\n",
    "X[0:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "What are our lables?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "button": false,
    "collapsed": true,
    "deletable": true,
    "jupyter": {
     "outputs_hidden": true
    },
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF'], dtype=object)"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y = df['loan_status'].values\n",
    "y[0:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "## Normalize Data "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "Data Standardization give data zero mean and unit variance (technically should be done after train test split )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "button": false,
    "collapsed": true,
    "deletable": true,
    "jupyter": {
     "outputs_hidden": true
    },
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0.51578458,  0.92071769,  2.33152555, -0.42056004, -1.20577805,\n",
       "        -0.38170062,  1.13639374, -0.86968108],\n",
       "       [ 0.51578458,  0.92071769,  0.34170148,  2.37778177, -1.20577805,\n",
       "         2.61985426, -0.87997669, -0.86968108],\n",
       "       [ 0.51578458, -0.95911111, -0.65321055, -0.42056004, -1.20577805,\n",
       "        -0.38170062, -0.87997669,  1.14984679],\n",
       "       [ 0.51578458,  0.92071769, -0.48739188,  2.37778177,  0.82934003,\n",
       "        -0.38170062, -0.87997669,  1.14984679],\n",
       "       [ 0.51578458,  0.92071769, -0.3215732 , -0.42056004,  0.82934003,\n",
       "        -0.38170062, -0.87997669,  1.14984679]])"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X= preprocessing.StandardScaler().fit(X).transform(X)\n",
    "X[0:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "# Classification "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "Now, it is your turn, use the training set to build an accurate model. Then use the test set to report the accuracy of the model\n",
    "You should use the following algorithm:\n",
    "- K Nearest Neighbor(KNN)\n",
    "- Decision Tree\n",
    "- Support Vector Machine\n",
    "- Logistic Regression\n",
    "\n",
    "\n",
    "\n",
    "__ Notice:__ \n",
    "- You can go above and change the pre-processing, feature selection, feature-extraction, and so on, to make a better model.\n",
    "- You should use either scikit-learn, Scipy or Numpy libraries for developing the classification algorithms.\n",
    "- You should include the code of the algorithm in the following cells."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# K Nearest Neighbor(KNN)\n",
    "Notice: You should find the best k to build the model with the best accuracy.  \n",
    "**warning:** You should not use the __loan_test.csv__ for finding the best k, however, you can split your train_loan.csv into train and test to find the best __k__."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Decision Tree"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Support Vector Machine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Logistic Regression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Model Evaluation using Test set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import jaccard_similarity_score\n",
    "from sklearn.metrics import f1_score\n",
    "from sklearn.metrics import log_loss"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, download and load the test set:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget -O loan_test.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/loan_test.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "### Load Test set for evaluation "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "button": false,
    "collapsed": true,
    "deletable": true,
    "jupyter": {
     "outputs_hidden": true
    },
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "outputs": [],
   "source": [
    "test_df = pd.read_csv('loan_test.csv')\n",
    "test_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Report\n",
    "You should be able to report the accuracy of the built model using different evaluation metrics:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "| Algorithm          | Jaccard | F1-score | LogLoss |\n",
    "|--------------------|---------|----------|---------|\n",
    "| KNN                | ?       | ?        | NA      |\n",
    "| Decision Tree      | ?       | ?        | NA      |\n",
    "| SVM                | ?       | ?        | NA      |\n",
    "| LogisticRegression | ?       | ?        | ?       |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "deletable": true,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    }
   },
   "source": [
    "<h2>Want to learn more?</h2>\n",
    "\n",
    "IBM SPSS Modeler is a comprehensive analytics platform that has many machine learning algorithms. It has been designed to bring predictive intelligence to decisions made by individuals, by groups, by systems – by your enterprise as a whole. A free trial is available through this course, available here: <a href=\"http://cocl.us/ML0101EN-SPSSModeler\">SPSS Modeler</a>\n",
    "\n",
    "Also, you can use Watson Studio to run these notebooks faster with bigger datasets. Watson Studio is IBM's leading cloud solution for data scientists, built by data scientists. With Jupyter notebooks, RStudio, Apache Spark and popular libraries pre-packaged in the cloud, Watson Studio enables data scientists to collaborate on their projects without having to install anything. Join the fast-growing community of Watson Studio users today with a free account at <a href=\"https://cocl.us/ML0101EN_DSX\">Watson Studio</a>\n",
    "\n",
    "<h3>Thanks for completing this lesson!</h3>\n",
    "\n",
    "<h4>Author:  <a href=\"https://ca.linkedin.com/in/saeedaghabozorgi\">Saeed Aghabozorgi</a></h4>\n",
    "<p><a href=\"https://ca.linkedin.com/in/saeedaghabozorgi\">Saeed Aghabozorgi</a>, PhD is a Data Scientist in IBM with a track record of developing enterprise level applications that substantially increases clients’ ability to turn data into actionable knowledge. He is a researcher in data mining field and expert in developing advanced analytic methods like machine learning and statistical modelling on large datasets.</p>\n",
    "\n",
    "<hr>\n",
    "\n",
    "<p>Copyright &copy; 2018 <a href=\"https://cocl.us/DX0108EN_CC\">Cognitive Class</a>. This notebook and its source code are released under the terms of the <a href=\"https://bigdatauniversity.com/mit-license/\">MIT License</a>.</p>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python",
   "language": "python",
   "name": "conda-env-python-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
